Search Results: "marco"

13 September 2021

Bits from Debian: New Debian Developers and Maintainers (July and August 2021)

The following contributors got their Debian Developer accounts in the last two months: The following contributors were added as Debian Maintainers in the last two months: Congratulations!

25 July 2021

Marco d'Itri: Run an Ansible playbook in a chroot

Running a playbook in a remote chroot or container is not supported by Ansible, but I have invented a good workaround to do it anyway. The first step is to install Mitogen for Ansible (ansible-mitogen in Debian) and then configure ansible.cfg to use it:
[defaults]
strategy = mitogen_linear
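For example, on Debian the installation step mentioned above is simply:
apt install ansible-mitogen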
But everybody should use Mitogen anyway, because it makes Ansible much faster. The trick to have Ansible operate in a chroot is to make it call a wrapper script instead of Python. The wrapper can be created manually or by another playbook, e.g.:
  vars:
  - fsroot: /mnt
  tasks:
  - name: Create the chroot wrapper
    copy:
      dest: "/usr/local/sbin/chroot_{{ inventory_hostname_short }}"
      mode: 0755
      content: |
        #!/bin/sh -e
        exec chroot {{ fsroot }} /usr/bin/python3 "$@"
  - name: Continue with stage 2 inside the chroot
    debug:
      msg:
        - "Please run:"
        - "ansible-playbook therealplaybook.yaml -l {{ inventory_hostname }} -e ansible_python_interpreter=/usr/local/sbin/chroot_{{ inventory_hostname_short }}"
This works thanks to Mitogen, which funnels all remote tasks inside that single call to Python. It would not work with standard Ansible, because it copies files to the remote system with SFTP and would do so outside of the chroot. The same principle can also be applied to containers by changing the wrapper script, e.g.:
#!/bin/sh -e
exec systemd-run --quiet --pipe --machine={{ container_name }} --service-type=exec /usr/bin/python3 "$@"
After the wrapper has been installed, you can run the real playbook by setting the ansible_python_interpreter variable, either on the command line, in the inventory or anywhere else that variables can be defined:
ansible-playbook therealplaybook.yaml -l {{ inventory_hostname }} -e ansible_python_interpreter=/usr/local/sbin/chroot_{{ inventory_hostname_short }}
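The variable does not have to be passed with -e: it can also be pinned per host in the inventory. A minimal sketch of an INI inventory, with an illustrative group and host name:
[chrooted]
buildhost ansible_python_interpreter=/usr/local/sbin/chroot_buildhost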

29 June 2021

Antoine Beaupré: Another syncmaildir crash

So I had another major email crash with my syncmaildir setup. This time I was at least able to confirm the issue, and I still haven't lost mail thanks to backups and sheer luck (again).

The crash

It is not really worth going over the crash in detail; it's fairly similar to the last one: something bad happened and smd started destroying everything. The hint is that it takes a long time to do what usually takes seconds. It helps that I now have a second monitor showing logs. I still lost much more mail than the last time. I used to have "301 723 messages", according to notmuch. But then when I ran smd-pull by hand, it was telling me:
95K emails scanned
Oops. You can see notmuch happily noticing the destroyed files on the server:
jun 28 16:33:40 marcos notmuch[28532]: No new mail. Removed 65498 messages. Detected 1699 file renames.
jun 28 16:36:05 marcos notmuch[29746]: No new mail. Removed 68883 messages. Detected 2488 file renames.
jun 28 16:41:40 marcos notmuch[31972]: No new mail. Removed 118295 messages. Detected 3657 file renames.
The final count ended up being 81 042 messages, according to notmuch. A whopping 220 000 mails deleted. The interesting bit, this time around, is that I caught smd in the act of running two processes in parallel:
jun 28 16:30:09 curie systemd[2845]: Finished pull emails with syncmaildir. 
jun 28 16:30:09 curie systemd[2845]: Starting push emails with syncmaildir... 
jun 28 16:30:09 curie systemd[2845]: Starting pull emails with syncmaildir... 
So clearly that is the source of the bug.

Recovery

Emergency stop on curie:
notmuch dump > notmuch.dump
systemctl --user --now disable smd-pull.service smd-pull.timer smd-push.service smd-push.timer notmuch-new.service notmuch-new.timer
On marcos (the server), I guessed the number of messages delivered since the last backup to be 71, just by looking at timestamps in the mail log. Made a list:
grep postfix/local /var/log/mail.log | tail -71 > lost-mail
Found postfix queue IDs:
sed 's/.*\]://;s/:.*//' lost-mail > qids
Turn those into message IDs, and find those that are missing from the disk (I had previously run notmuch new just to be sure it was up to date):
while read qid ; do 
    grep "$qid: message-id" /var/log/mail.log
done < qids | sed 's/.*message-id=<//;s/>//' | while read msgid; do
    sudo -u anarcat notmuch count --exclude=false id:$msgid | grep -q 0 && echo $msgid
done
Copy this back on curie as missing-msgids and:
$ wc -l missing-msgids 
48 missing-msgids
$ while read msgid ; do notmuch count --exclude=false id:$msgid | grep -q 0 && echo $msgid ; done < missing-msgids
mailman.189.1624881611.23397.nodes-reseaulibre.ca@reseaulibre.ca
AnwMy7rdSpK-N-vt4AiOag@ismtpd0148p1mdw1.sendgrid.net
only two mails missing! whoohoo! Copy those back onto marcos as really-missing-msgids, and look at the full mail logs to see what they are:
~anarcat/src/koumbit-scripts/mail/postfix-trace --from-file really-missing-msgids2
I actually remembered deleting those, so no mail lost! Rebuild the list of msgids that were lost, on marcos:
while read qid ; do grep "$qid: message-id" /var/log/mail.log; done < qids | sed 's/.*message-id=<//;s/>//'
Copy that on curie as lost-mail-msgids, then copy the files over in a test dir:
while read msgid ; do
    notmuch search --output=files --exclude=false "id:$msgid"
done < lost-mail-msgids | sed 's#/home/anarcat/Maildir/##' | rsync -v --files-from=- /home/anarcat/Maildir/ shell.anarc.at:restore/Maildir-angela/
If that looks about right, on marcos:
find restore/Maildir-angela/ -type f | wc -l
... should match the number of missing mails, roughly. Copy it into the real spool:
while read msgid ; do
    notmuch search --output=files --exclude=false "id:$msgid"
done < lost-mail-msgids | sed 's#/home/anarcat/Maildir/##' | rsync -v --files-from=- /home/anarcat/Maildir/ shell.anarc.at:Maildir/
Then on the server, notmuch new should find the new emails, and we shouldn't have any lost mail anymore:
while read qid ; do grep "$qid: message-id" /var/log/mail.log; done < qids | sed 's/.*message-id=<//;s/>//' | while read msgid; do sudo -u anarcat notmuch count --exclude=false id:$msgid | grep -q 0 && echo $msgid ; done
Then, crucial moment, try to pull the new mails from the backups on curie:
anarcat@curie:~(main)$ smd-pull -n --show-tags -v
Found lockfile of a dead instance. Ignored.
Phase 0: handshake
Phase 1: changes detection
    5K emails scanned
   10K emails scanned
   15K emails scanned
   20K emails scanned
   25K emails scanned
   30K emails scanned
   35K emails scanned
   40K emails scanned
   45K emails scanned
   50K emails scanned
Phase 2: synchronization
Phase 3: agreement
default: smd-client@localhost: TAGS: stats::new-mails(49687), del-mails(0), bytes-received(215752279), xdelta-received(3703852)
"smd-pull  -n  --show-tags -v" took 3 mins 39 secs
This brought me back to the state after the backup plus the mails delivered during the day, which means I had to catch up with all my holiday's read emails (1440 mails!), but thankfully I made a dump of the notmuch database on curie at the start of the procedure, so this actually restored a sane state:
pv notmuch.dump | notmuch restore
Phew!

Workaround

I have filed this as a bug upstream (issue 18). Considering I filed 11 issues and only 3 of those were closed, I'm not holding my breath. I nevertheless filed PR 19 in the hope that this will fix my particular issue, but I'm not even sure this is the right fix...

Fix

At this point, I'm really ready to give up on SMD. It's really, really nice to be able to sync mail over SSH, because I don't need to store my IMAP password on disk. But surely there are more reliable syncing mechanisms. I do not remember ever losing that much mail before. At worst, offlineimap would duplicate emails like mad, but never destroy my entire mail spool that way. As mentioned before, there are other programs that sync mail. I'm looking at:
  • offlineimap3: requires IMAP, used the py2 version in the past, might just still work, first sync painful (IIRC), ways to tunnel over SSH, see comment below
  • isync/mbsync: might be faster, I remember having trouble switching from offlineimap to this, has support for TLS client certs, running over SSH, and generally has good words from multiple Debian and notmuch people
  • getmail: just downloads email, might not be enough
  • nncp: treat the local spool as another mail server, might not be compatible with my "multiple clients" setup
  • doveadm-sync: requires dovecot on both ends, but supports using SSH to sync, will try this next, may have performance problems, see comment below
  • interimap: syncs two IMAP servers, apparently faster than doveadm and offlineimap
  • mail-sync: notify support, happens over any piped transport (e.g. ssh), diff/patch system, requires binary on both ends, mentions UUCP in the manpage, seems awfully complicated to set up, mentions rsmtp which is a nice name for rsendmail

1 June 2021

Paul Wise: FLOSS Activities May 2021

Focus

This month I didn't have any particular focus. I just worked on issues in my info bubble.

Changes

Issues

Review

Administration
  • Debian wiki: unblock IP addresses, approve accounts

Communication
  • Joined the great IRC migration
  • Respond to queries from Debian users and contributors on the mailing lists and IRC

Sponsors

The purple-discord, sptag and esprima-python work was sponsored by my employer. All other work was done on a volunteer basis.

19 May 2021

Marco d'Itri: My resignation from freenode

As it is now known, the freenode IRC network has been taken over by a Trumpian wannabe korean royalty bitcoins millionaire. To make a long story short, the former freenode head of staff secretly "sold" the network to this person even if it was not hers to sell, and our lawyers have advised us that there is not much that we can do about it without some of us risking financial ruin. Fuck you Christel, lilo's life work did not deserve this.

What you knew as freenode after 12:00 UTC of May 19 will be managed by different people. As I have no desire to volunteer under the new regime, this marks the end of my involvement with freenode. It had started in 1999 when I encouraged the good parts of #linux-it to leave ircnet, and soon after I became senior staff. Even if I have not been very active recently, at this point I was the longest-serving freenode staff member and now I expect that I will hold this record forever.

The people that I have met on IRC, on freenode and other networks, have been and still are a very important part of my life, second only to the ones that I have known thanks to Usenet. I am not fine, but I know that the communities which I have been a part of are not defined by a domain name and will regroup somewhere else. The current freenode staff members have resigned with me, these are some of their farewell messages:
  • amdj
  • edk
  • emilsp
  • Fuchs
  • jess
  • JonathanD
  • kline
  • niko
  • mniip
  • Swant
Together we have created Libera.Chat, a new IRC network based on the same principles as the old freenode.

    1 April 2021

    Paul Wise: FLOSS Activities March 2021

    Focus

    This month I didn't have any particular focus. I just worked on issues in my info bubble.

    Changes

    Issues

    Debugging

    Review

    Administration
    • Debian packages: migrate flower git repo from alioth-archive to salsa
    • Debian: restart bacula-director after PostgreSQL restart
    • Debian wiki: block spammer, clean up spam, approve accounts

    Communication

    Sponsors

    The librecaptcha/libpst/flower/marco work was sponsored by my employers. All other work was done on a volunteer basis.

    23 March 2021

    Antoine Beaupré: Major email crash with syncmaildir

    TL;DR: lost half my mail (150,000 messages, ~6GB) last night. Cause uncertain, but possibly a combination of a dead CMOS battery, systemd OnCalendar=daily, a (locking?) bug in syncmaildir, and generally, a system too exotic and complicated.

    The crash

    So I somehow lost half my mail:
    anarcat@angela:~(main)$ du -sh Maildir/
    7,9G    Maildir/
    anarcat@curie:~(main)$ du -sh Maildir
    14G     Maildir
    anarcat@marcos:~$ du -sh Maildir
    8,0G    Maildir
    
    Those are three different machines:
    • angela: my laptop, not always on
    • curie: my workstation, mostly always on
    • marcos: my mail server, always on
    Those mails are synchronized using a rather exotic system based on SSH, syncmaildir and rsendmail. The anomaly started on curie:
    -- Reboot --
    mar 22 16:13:00 curie systemd[3199]: Starting pull emails with syncmaildir...
    mar 22 16:13:00 curie smd-pull[4801]: rm: impossible de supprimer '/home/anarcat/.smd/workarea/Maildir': Le dossier n'est pas vide
    mar 22 16:13:00 curie systemd[3199]: smd-pull.service: Main process exited, code=exited, status=1/FAILURE
    mar 22 16:13:00 curie systemd[3199]: smd-pull.service: Failed with result 'exit-code'.
    mar 22 16:13:00 curie systemd[3199]: Failed to start pull emails with syncmaildir.
    mar 22 16:14:00 curie systemd[3199]: Starting pull emails with syncmaildir...
    mar 22 16:14:00 curie smd-pull[7025]:  4091 ?        00:00:00 smd-push
    mar 22 16:14:00 curie smd-pull[7025]: Already running.
    mar 22 16:14:00 curie smd-pull[7025]: If this is not the case, remove /home/anarcat/.smd/lock by hand.
    mar 22 16:14:00 curie smd-pull[7025]: any: smd-pushpull@localhost: TAGS: error::context(locking) probable-cause(another-instance-is-running) human-intervention(necessary) suggested-actions(run(kill 4091) run(rm /home/anarcat/.smd/lock))
    mar 22 16:14:00 curie systemd[3199]: smd-pull.service: Main process exited, code=exited, status=1/FAILURE
    mar 22 16:14:00 curie systemd[3199]: smd-pull.service: Failed with result 'exit-code'.
    mar 22 16:14:00 curie systemd[3199]: Failed to start pull emails with syncmaildir.
    
    (The French rm error above translates to: "cannot remove '/home/anarcat/.smd/workarea/Maildir': Directory not empty".) Then it seems like smd-push (from curie) started destroying the universe for some reason:
    mar 22 16:20:00 curie systemd[3199]: Starting pull emails with syncmaildir...
    mar 22 16:20:00 curie smd-pull[9319]:  4091 ?        00:00:00 smd-push
    mar 22 16:20:00 curie smd-pull[9319]: Already running.
    mar 22 16:20:00 curie smd-pull[9319]: If this is not the case, remove /home/anarcat/.smd/lock by hand.
    mar 22 16:20:00 curie smd-pull[9319]: any: smd-pushpull@localhost: TAGS: error::context(locking) probable-cause(another-instance-is-running) human-intervention(necessary) suggested-actions(ru
    mar 22 16:20:00 curie systemd[3199]: smd-pull.service: Main process exited, code=exited, status=1/FAILURE
    mar 22 16:20:00 curie systemd[3199]: smd-pull.service: Failed with result 'exit-code'.
    mar 22 16:20:00 curie systemd[3199]: Failed to start pull emails with syncmaildir.
    mar 22 16:21:34 curie smd-push[4091]: default: smd-client@smd-server-anarcat: TAGS: stats::new-mails(0), del-mails(293920), bytes-received(0), xdelta-received(26995)
    mar 22 16:21:35 curie smd-push[9374]: register: smd-client@smd-server-register: TAGS: stats::new-mails(0), del-mails(0), bytes-received(0), xdelta-received(215)
    mar 22 16:21:35 curie systemd[3199]: smd-push.service: Succeeded.
    
    Notice the del-mails(293920) there: it is actively trying to destroy basically every email in my mail spool. Then somehow push and pull started both at once:
    mar 22 16:21:35 curie systemd[3199]: Started push emails with syncmaildir.
    mar 22 16:21:35 curie systemd[3199]: Starting push emails with syncmaildir...
    mar 22 16:22:00 curie systemd[3199]: Starting pull emails with syncmaildir...
    mar 22 16:22:00 curie smd-pull[10333]:  9455 ?        00:00:00 smd-push
    mar 22 16:22:00 curie smd-pull[10333]: Already running.
    mar 22 16:22:00 curie smd-pull[10333]: If this is not the case, remove /home/anarcat/.smd/lock by hand.
    mar 22 16:22:00 curie smd-pull[10333]: any: smd-pushpull@localhost: TAGS: error::context(locking) probable-cause(another-instance-is-running) human-intervention(necessary) suggested-actions(r
    mar 22 16:22:00 curie systemd[3199]: smd-pull.service: Main process exited, code=exited, status=1/FAILURE
    mar 22 16:22:00 curie systemd[3199]: smd-pull.service: Failed with result 'exit-code'.
    mar 22 16:22:00 curie systemd[3199]: Failed to start pull emails with syncmaildir.
    mar 22 16:22:00 curie smd-push[9455]: smd-client: ERROR: Data transmission failed.
    mar 22 16:22:00 curie smd-push[9455]: smd-client: ERROR: This problem is transient, please retry.
    mar 22 16:22:00 curie smd-push[9455]: smd-client: ERROR: server sent ABORT or connection died
    mar 22 16:22:00 curie smd-push[9455]: smd-server: ERROR: Unable to open Maildir/.kobo/cur/1498563708.M122624P22121.marcos,S=32234,W=32792:2,S: Maildir/.kobo/cur/1498563708.M122624P22121.marco
    mar 22 16:22:00 curie smd-push[9455]: smd-server: ERROR: The problem should be transient, please retry.
    mar 22 16:22:00 curie smd-push[9455]: smd-server: ERROR: Unable to open requested file.
    mar 22 16:22:00 curie smd-push[9455]: default: smd-client@smd-server-anarcat: TAGS: stats::new-mails(0), del-mails(293920), bytes-received(0), xdelta-received(26995)
    mar 22 16:22:00 curie smd-push[9455]: default: smd-client@smd-server-anarcat: TAGS: error::context(receive) probable-cause(network) human-intervention(avoidable) suggested-actions(retry)
    mar 22 16:22:00 curie smd-push[9455]: default: smd-server@localhost: TAGS: error::context(transmit) probable-cause(simultaneous-mailbox-edit) human-intervention(avoidable) suggested-actions(r
    mar 22 16:22:00 curie systemd[3199]: smd-push.service: Main process exited, code=exited, status=1/FAILURE
    mar 22 16:22:00 curie systemd[3199]: smd-push.service: Failed with result 'exit-code'.
    mar 22 16:22:00 curie systemd[3199]: Failed to start push emails with syncmaildir.
    
    There it seems push tried to destroy the universe again: del-mails(293920). Interestingly, the push started again in parallel with the pull, right that minute:
    mar 22 16:22:00 curie systemd[3199]: Starting push emails with syncmaildir...
    
    ... but didn't complete for a while, here's pull trying to start again:
    mar 22 16:24:00 curie systemd[3199]: Starting pull emails with syncmaildir...
    mar 22 16:24:00 curie smd-pull[12051]: 10466 ?        00:00:00 smd-push
    mar 22 16:24:00 curie smd-pull[12051]: Already running.
    mar 22 16:24:00 curie smd-pull[12051]: If this is not the case, remove /home/anarcat/.smd/lock by hand.
    mar 22 16:24:00 curie smd-pull[12051]: any: smd-pushpull@localhost: TAGS: error::context(locking) probable-cause(another-instance-is-running) human-intervention(necessary) suggested-actions(run(kill 10466) run(rm /home/anarcat/.smd/lock))
    mar 22 16:24:00 curie systemd[3199]: smd-pull.service: Main process exited, code=exited, status=1/FAILURE
    mar 22 16:24:00 curie systemd[3199]: smd-pull.service: Failed with result 'exit-code'.
    mar 22 16:24:00 curie systemd[3199]: Failed to start pull emails with syncmaildir.
    
    ... and the long push finally resolving:
    mar 22 16:24:00 curie smd-push[10466]: smd-client: ERROR: Data transmission failed.
    mar 22 16:24:00 curie smd-push[10466]: smd-client: ERROR: This problem is transient, please retry.
    mar 22 16:24:00 curie smd-push[10466]: smd-client: ERROR: server sent ABORT or connection died
    mar 22 16:24:00 curie smd-push[10466]: smd-client: ERROR: Data transmission failed.
    mar 22 16:24:00 curie smd-push[10466]: smd-client: ERROR: This problem is transient, please retry.
    mar 22 16:24:00 curie smd-push[10466]: smd-client: ERROR: server sent ABORT or connection died
    mar 22 16:24:00 curie smd-push[10466]: smd-server: ERROR: Unable to open Maildir/.kobo/cur/1498563708.M122624P22121.marcos,S=32234,W=32792:2,S: Maildir/.kobo/cur/1498563708.M122624P22121.marcos,S=32234,W=32792:2,S: No such file or directory
    mar 22 16:24:00 curie smd-push[10466]: smd-server: ERROR: The problem should be transient, please retry.
    mar 22 16:24:00 curie smd-push[10466]: smd-server: ERROR: Unable to open requested file.
    mar 22 16:24:00 curie smd-push[10466]: default: smd-client@smd-server-anarcat: TAGS: stats::new-mails(0), del-mails(293920), bytes-received(0), xdelta-received(26995)
    mar 22 16:24:00 curie smd-push[10466]: default: smd-client@smd-server-anarcat: TAGS: error::context(receive) probable-cause(network) human-intervention(avoidable) suggested-actions(retry)
    mar 22 16:24:00 curie smd-push[10466]: default: smd-server@localhost: TAGS: error::context(transmit) probable-cause(simultaneous-mailbox-edit) human-intervention(avoidable) suggested-actions(retry)
    mar 22 16:24:00 curie systemd[3199]: smd-push.service: Main process exited, code=exited, status=1/FAILURE
    mar 22 16:24:00 curie systemd[3199]: smd-push.service: Failed with result 'exit-code'.
    mar 22 16:24:00 curie systemd[3199]: Failed to start push emails with syncmaildir.
    mar 22 16:24:00 curie systemd[3199]: Starting push emails with syncmaildir...
    
    This pattern repeats until 16:35, when that locking issue silently recovered somehow:
    mar 22 16:35:03 curie systemd[3199]: Starting pull emails with syncmaildir...
    mar 22 16:35:41 curie smd-pull[20788]: default: smd-client@localhost: TAGS: stats::new-mails(5), del-mails(1), bytes-received(21885), xdelta-received(6863398)
    mar 22 16:35:42 curie smd-pull[21373]: register: smd-client@localhost: TAGS: stats::new-mails(0), del-mails(0), bytes-received(0), xdelta-received(215)
    mar 22 16:35:42 curie systemd[3199]: smd-pull.service: Succeeded.
    mar 22 16:35:42 curie systemd[3199]: Started pull emails with syncmaildir.
    mar 22 16:36:35 curie systemd[3199]: Starting pull emails with syncmaildir...
    mar 22 16:36:36 curie smd-pull[21738]: default: smd-client@localhost: TAGS: stats::new-mails(0), del-mails(0), bytes-received(0), xdelta-received(214)
    mar 22 16:36:37 curie smd-pull[21816]: register: smd-client@localhost: TAGS: stats::new-mails(0), del-mails(0), bytes-received(0), xdelta-received(215)
    mar 22 16:36:37 curie systemd[3199]: smd-pull.service: Succeeded.
    mar 22 16:36:37 curie systemd[3199]: Started pull emails with syncmaildir.
    
    ... notice that huge xdelta-received there, that's 7GB right there. Mysteriously, the curie mail spool survived this, possibly because smd-pull started failing again:
    mar 22 16:38:00 curie systemd[3199]: Starting pull emails with syncmaildir...
    mar 22 16:38:00 curie smd-pull[23556]: 21887 ?        00:00:00 smd-push
    mar 22 16:38:00 curie smd-pull[23556]: Already running.
    mar 22 16:38:00 curie smd-pull[23556]: If this is not the case, remove /home/anarcat/.smd/lock by hand.
    mar 22 16:38:00 curie smd-pull[23556]: any: smd-pushpull@localhost: TAGS: error::context(locking) probable-cause(another-instance-is-running) human-intervention(necessary) suggested-actions(run(kill 21887) run(rm /home/anarcat/.smd/lock))
    mar 22 16:38:00 curie systemd[3199]: smd-pull.service: Main process exited, code=exited, status=1/FAILURE
    mar 22 16:38:00 curie systemd[3199]: smd-pull.service: Failed with result 'exit-code'.
    mar 22 16:38:00 curie systemd[3199]: Failed to start pull emails with syncmaildir.
    
    That could have been when I got on angela to check my mail, and it was busy doing the nasty removal stuff... although the times don't match. Here is when angela came back online:
    anarcat@angela:~(main)$ last
    anarcat  :0           :0               Mon Mar 22 19:57   still logged in
    reboot   system boot  5.10.0-0.bpo.3-a Mon Mar 22 19:57   still running
    anarcat  :0           :0               Mon Mar 22 17:43 - 18:47  (01:03)
    reboot   system boot  5.10.0-0.bpo.3-a Mon Mar 22 17:39   still running
    
    Then finally the sync on curie started failing with:
    mar 22 16:46:35 curie systemd[3199]: Starting pull emails with syncmaildir...
    mar 22 16:46:42 curie smd-pull[27455]: smd-server: ERROR: Client aborted, removing /home/anarcat/.smd/curie-anarcat__Maildir.db.txt.new and /home/anarcat/.smd/curie-anarcat__Maildir.db.txt.mtime.new
    mar 22 16:46:42 curie smd-pull[27455]: smd-client: ERROR: Failed to copy Maildir/.debian/cur/1613401668.M901837P27073.marcos,S=3740,W=3815:2,S to Maildir/.koumbit/cur/1613401640.M415457P27063.marcos,S=3790,W=3865:2,S
    mar 22 16:46:42 curie smd-pull[27455]: smd-client: ERROR: The destination already exists but its content differs.
    mar 22 16:46:42 curie smd-pull[27455]: smd-client: ERROR: To fix this problem you have two options:
    mar 22 16:46:42 curie smd-pull[27455]: smd-client: ERROR: - rename Maildir/.koumbit/cur/1613401640.M415457P27063.marcos,S=3790,W=3865:2,S by hand so that Maildir/.debian/cur/1613401668.M901837P27073.marcos,S=3740,W=3815:2,S
    mar 22 16:46:42 curie smd-pull[27455]: smd-client: ERROR:   can be copied without replacing it.
    mar 22 16:46:42 curie smd-pull[27455]: smd-client: ERROR:   Executing  cd; mv -n "Maildir/.koumbit/cur/1613401640.M415457P27063.marcos,S=3790,W=3865:2,S" "Maildir/.koumbit/cur/1616446002.1.localhost"  should work.
    mar 22 16:46:42 curie smd-pull[27455]: smd-client: ERROR: - run smd-push so that your changes to Maildir/.koumbit/cur/1613401640.M415457P27063.marcos,S=3790,W=3865:2,S
    mar 22 16:46:42 curie smd-pull[27455]: smd-client: ERROR:   are propagated to the other mailbox
    mar 22 16:46:42 curie smd-pull[27455]: default: smd-client@localhost: TAGS: error::context(copy-message) probable-cause(concurrent-mailbox-edit) human-intervention(necessary) suggested-actions(run(mv -n "/home/anarcat/.smd/workarea/Maildir/.koumbit/cur/1613401640.M415457P27063.marcos,S=3790,W=3865:2,S" "/home/anarcat/.smd/workarea/Maildir/.koumbit/tmp/1613401640.M415457P27063.marcos,S=3790,W=3865:2,S") run(smd-push default))
    mar 22 16:46:42 curie systemd[3199]: smd-pull.service: Main process exited, code=exited, status=1/FAILURE
    mar 22 16:46:42 curie systemd[3199]: smd-pull.service: Failed with result 'exit-code'.
    mar 22 16:46:42 curie systemd[3199]: Failed to start pull emails with syncmaildir.
    
    It went on like this until I found the problem. This is, presumably, a good thing because those emails were not being destroyed. On angela, things looked like this:
    -- Reboot --
    mar 22 17:39:29 angela systemd[1677]: Started run notmuch new at least once a day.
    mar 22 17:39:29 angela systemd[1677]: Started run smd-pull regularly.
    mar 22 17:40:46 angela systemd[1677]: Starting pull emails with syncmaildir...
    mar 22 17:43:18 angela smd-pull[3916]: smd-server: ERROR: Unable to open Maildir/.tor/new/1616446842.M285912P26118.marcos,S=8860,W=8996: Maildir/.tor/new/1616446842.M285912P26118.marcos,S=8860,W=8996: No such file or directory
    mar 22 17:43:18 angela smd-pull[3916]: smd-server: ERROR: The problem should be transient, please retry.
    mar 22 17:43:18 angela smd-pull[3916]: smd-server: ERROR: Unable to open requested file.
    mar 22 17:43:18 angela smd-pull[3916]: smd-client: ERROR: Data transmission failed.
    mar 22 17:43:18 angela smd-pull[3916]: smd-client: ERROR: This problem is transient, please retry.
    mar 22 17:43:18 angela smd-pull[3916]: smd-client: ERROR: server sent ABORT or connection died
    mar 22 17:43:18 angela smd-pull[3916]: default: smd-server@smd-server-anarcat: TAGS: error::context(transmit) probable-cause(simultaneous-mailbox-edit) human-intervention(avoidable) suggested-actions(retry)
    mar 22 17:43:18 angela smd-pull[3916]: default: smd-client@localhost: TAGS: error::context(receive) probable-cause(network) human-intervention(avoidable) suggested-actions(retry)
    mar 22 17:43:18 angela systemd[1677]: smd-pull.service: Main process exited, code=exited, status=1/FAILURE
    mar 22 17:43:18 angela systemd[1677]: smd-pull.service: Failed with result 'exit-code'.
    mar 22 17:43:18 angela systemd[1677]: Failed to start pull emails with syncmaildir.
    mar 22 17:43:18 angela systemd[1677]: Starting pull emails with syncmaildir...
    mar 22 17:43:29 angela smd-pull[4847]: default: smd-client@localhost: TAGS: stats::new-mails(29), del-mails(0), bytes-received(401519), xdelta-received(38914)
    mar 22 17:43:29 angela smd-pull[5600]: register: smd-client@localhost: TAGS: stats::new-mails(2), del-mails(0), bytes-received(92150), xdelta-received(471)
    mar 22 17:43:29 angela systemd[1677]: smd-pull.service: Succeeded.
    mar 22 17:43:29 angela systemd[1677]: Started pull emails with syncmaildir.
    mar 22 17:43:29 angela systemd[1677]: Starting push emails with syncmaildir...
    mar 22 17:43:32 angela smd-push[5693]: default: smd-client@smd-server-anarcat: TAGS: stats::new-mails(0), del-mails(0), bytes-received(0), xdelta-received(217)
    mar 22 17:43:33 angela smd-push[6575]: register: smd-client@smd-server-register: TAGS: stats::new-mails(0), del-mails(0), bytes-received(0), xdelta-received(219)
    mar 22 17:43:33 angela systemd[1677]: smd-push.service: Succeeded.
    mar 22 17:43:33 angela systemd[1677]: Started push emails with syncmaildir.
    
    Notice how long it took to get the first error, in that first failure: it failed after 3 minutes! Presumably that's when it started deleting all that mail. And this is during pull, not push, so the error didn't come from angela.

    Affected data

    It seems 2GB of mail from my main INBOX was destroyed. Another 2.4GB of spam (kept for training purposes) was also destroyed, along with 700MB of Sent mail. The rest is hard to figure out, because the folders are actually still there, just smaller. So I relied on ncdu to figure out the size changes. (Note that I don't really archive (or delete much of) my mail since I use notmuch, which is why the INBOX is so large...) Concretely, according to the notmuch-new.service which still runs periodically on marcos, here are the changes that happened on the server:
    mar 22 16:17:12 marcos notmuch[10729]: Added 7 new messages to the database. Removed 57985 messages. Detected 1372 file renames.
    mar 22 16:22:43 marcos notmuch[12826]: No new mail. Removed 143842 messages. Detected 6072 file renames.
    mar 22 16:27:02 marcos notmuch[13969]: No new mail. Removed 82071 messages. Detected 1783 file renames.
    mar 22 16:29:45 marcos notmuch[15079]: Added 22743 new messages to the database. Detected 1 file rename.
    mar 22 16:31:48 marcos notmuch[16196]: Added 22779 new messages to the database. Removed 5 messages.
    mar 22 16:33:11 marcos notmuch[17192]: Added 3711 new messages to the database.
    mar 22 16:40:41 marcos notmuch[19122]: Added 74558 new messages to the database. Detected 1 file rename.
    mar 22 16:43:21 marcos notmuch[20325]: Added 9061 new messages to the database. Detected 4 file renames.
    mar 22 17:43:08 marcos notmuch[7420]: Added 1793 new messages to the database. Detected 6 file renames.
    
    That is basically the entire mail spool destroyed at first (283 898 messages), and then bits and pieces of it progressively re-added (134 645 messages), somehow, so 149 253 mails were lost, presumably.

    Recovery

    I disabled the services all over the place:
    systemctl --user --now disable smd-pull.service smd-pull.timer smd-push.service smd-push.timer notmuch-new.service notmuch-new.timer
    
    (Well, technically, I did that only on angela, as I thought the problem was there. Luckily, curie kept going but it seems like it was harmless.) I made a backup of the mail spool on curie:
    tar cf - Maildir/ | pv -s 14G | gzip -c > Maildir.tgz
    
    Then I crossed my fingers and ran smd-push -v -s, as that was suggested by the smd error codes themselves. That thankfully started restoring mail. It failed a few times on weird cases of files being duplicates, but I resolved this by following the instructions. Or mostly: I actually deleted the files instead of moving them, which made smd even unhappier (if there ever was such a thing). I had to recreate some of those files, so, lesson learned: do follow the advice smd gives you, even if it seems useless or strange.

    But then smd-push was humming along, uploading tens of thousands of messages, saturating the upload in the office, refilling the mail spool on the server... yaay!... ? Except... well, of course that didn't quite work: the mail spool in the office eventually started to grow beyond the size of the mail spool on the workstation. That is what smd-push eventually settled on:
    default: smd-client@smd-server-anarcat: TAGS: error::context(receive) probable-cause(network) human-intervention(avoidable) suggested-actions(retry)
    default: smd-client@smd-server-anarcat: TAGS: error::context(receive) probable-cause(network) human-intervention(avoidable) suggested-actions(retry)
    default: smd-client@smd-server-anarcat: TAGS: stats::new-mails(151697), del-mails(0), bytes-received(7539147811), xdelta-received(10881198)
    
    It recreated 151 697 emails, adding about 2000 emails to the pool, kind of from nowhere at all. On marcos, before:
    ncdu 1.13 ~ Use the arrow keys to navigate, press ? for help
    --- /home/anarcat/Maildir ------------------------------------
        4,0 GiB [##########] /.notmuch
      717,3 MiB [#         ] /.Archives.2014
      498,2 MiB [#         ] /.feeds.debian-planet
      453,1 MiB [#         ] /.Archives.2012
      414,5 MiB [#         ] /.debian
      408,2 MiB [#         ] /.quoifaire
      389,8 MiB [          ] /.rapports
      356,6 MiB [          ] /.tor
      182,6 MiB [          ] /.koumbit
      179,8 MiB [          ] /tmp
       56,8 MiB [          ] /.nn
       43,0 MiB [          ] /.act-mtl
       32,6 MiB [          ] /.feeds.sysadvent
       31,7 MiB [          ] /.feeds.releases
       31,4 MiB [          ] /.Sent.2005
       26,3 MiB [          ] /.sage
       25,5 MiB [          ] /.freedombox
       24,0 MiB [          ] /.feeds.git-annex
       21,1 MiB [          ] /.Archives.2011
       19,1 MiB [          ] /.Sent.2003
       16,7 MiB [          ] /.bugtraq
       16,2 MiB [          ] /.mlug
     Total disk usage:   8,0 GiB  Apparent size:   7,6 GiB  Items: 184426
    
    After:
    ncdu 1.13 ~ Use the arrow keys to navigate, press ? for help
    --- /home/anarcat/Maildir ------------------------------------
        4,7 GiB [##########] /.notmuch
        2,7 GiB [#####     ] /.junk
        1,9 GiB [###       ] /cur
      717,3 MiB [#         ] /.Archives.2014
      659,3 MiB [#         ] /.Sent
      513,9 MiB [#         ] /.Archives.2012
      498,2 MiB [#         ] /.feeds.debian-planet
      449,6 MiB [          ] /.Archives.2015
      414,5 MiB [          ] /.debian
      408,2 MiB [          ] /.quoifaire
      389,8 MiB [          ] /.rapports
      380,8 MiB [          ] /.Archives.2013
      356,6 MiB [          ] /.tor
      261,1 MiB [          ] /.Archives.2011
      240,9 MiB [          ] /.koumbit
      183,6 MiB [          ] /.Archives.2010
      179,8 MiB [          ] /tmp
      128,4 MiB [          ] /.lists
      106,1 MiB [          ] /.inso-interne
      103,0 MiB [          ] /.github
       75,0 MiB [          ] /.nanog
       69,8 MiB [          ] /.full-disclosure
     Total disk usage:  16,2 GiB  Apparent size:  15,5 GiB  Items: 341143
    
    That is 156 717 files more. On curie:
    ncdu 1.13 ~ Use the arrow keys to navigate, press ? for help
    --- /home/anarcat/Maildir ------------------------------------------------------------------
        2,7 GiB [##########] /.junk
        2,3 GiB [########  ] /.notmuch
        1,9 GiB [######    ] /cur
      661,2 MiB [##        ] /.Archives.2014
      655,3 MiB [##        ] /.Sent
      512,0 MiB [#         ] /.Archives.2012
      447,3 MiB [#         ] /.Archives.2015
      438,5 MiB [#         ] /.feeds.debian-planet
      406,5 MiB [#         ] /.quoifaire
      383,6 MiB [#         ] /.debian
      378,6 MiB [#         ] /.Archives.2013
      303,3 MiB [#         ] /.tor
      296,0 MiB [#         ] /.rapports
      237,6 MiB [          ] /.koumbit
      233,2 MiB [          ] /.Archives.2011
      182,1 MiB [          ] /.Archives.2010
      127,0 MiB [          ] /.lists
      104,8 MiB [          ] /.inso-interne
      102,7 MiB [          ] /.register
       89,6 MiB [          ] /.github
       67,1 MiB [          ] /.full-disclosure
       66,5 MiB [          ] /.nanog
     Total disk usage:  13,3 GiB  Apparent size:  12,6 GiB  Items: 342465
    
    Interestingly, there are more files, but less disk usage. It's possible the notmuch database there is more efficient. So maybe there's nothing to worry about. Last night's marcos backup has:
    root@marcos:/home/anarcat# find /mnt/home/anarcat/Maildir | pv -l | wc -l
     341k 0:00:16 [20,4k/s] [                             <=>                                                                                                                                     ]
    341040
    
    ... 341040 files, which seems about right, considering some mail was delivered during the day. An audit can be performed with hashdeep:
    borg mount /media/sdb2/borg/::marcos-auto-2021-03-22 /mnt
    hashdeep -c sha256 -r /mnt/home/anarcat/Maildir | pv -l -s 341k > Maildir-backup-manifest.txt
    
    And then compared with:
    hashdeep -c sha256 -a -k Maildir-backup-manifest.txt -r Maildir/
    
    Some extra files should show up in the Maildir, and very few should actually be missing, because I shouldn't have deleted mail from the previous day on the very next day (or at least, very little). The actual summary hashdeep gave me was:
    hashdeep: Audit failed
       Input files examined: 0
      Known files expecting: 0
              Files matched: 339080
    Files partially matched: 0
                Files moved: 782
            New files found: 107
      Known files not found: 106
    
    So 107 files added, 106 deleted. Seems good enough for me... Postfix was stopped at Mar 22 21:12:59 to try and stop external events from confusing things even further. I reviewed the delivery log to see if mail that came in during the problem window disappeared:
    grep 'dovecot:.*stored mail into mailbox' /var/log/mail.log |
      tail -20 |
      sed 's/.*msgid=<//;s/>.*//' |
      while read msgid; do
        notmuch count --exclude=false id:$msgid |
          grep 0 && echo $msgid missing;
      done
    
    And things looked okay. Now of course if we go further back, we find mail I actually deleted (because I do do that sometimes), so it's hard to use this log as an audit trail. We can only hope that the curie spool is sufficiently coherent to be relied on. Worst case, we'll have to restore from last night's backup, but that's getting far away now: I get hundreds of mails a day in that mail spool, and resetting back to last night does not seem like a good idea. A dry run of smd-pull on angela seems to agree that it's missing some files:
    default: smd-client@localhost: TAGS: stats::new-mails(154914), del-mails(0), bytes-received(0), xdelta-received(0)
    
    ... a number of mails somewhere in between the other two, go figure. A "wet" run of this was started, without deletion (-n), which gave us:
    default: smd-client@localhost: TAGS: stats::new-mails(154911), del-mails(0), bytes-received(7658160107), xdelta-received(10837609)
    
    Strange that it sync'd three fewer emails, but that's still better than nothing, and we have a mail spool on angela again:
    anarcat@angela:~(main)$ notmuch new
    purging with prefix '.': spam moved (0), ham moved (0), deleted (0), done
    Note: Ignoring non-mail file: /home/anarcat/Maildir//.uidvalidity
    Processed 1779 total files in 26s (66 files/sec.).
    Added 1190 new messages to the database. Removed 3 messages. Detected 593 file renames.
    tagging with prefix '.': spam, sent, feeds, koumbit, tor, lists, rapports, folders, done.
    
    Notice how only 1190 messages were re-added; that is because I killed notmuch before it had time to remove all those mails from its database.

    Possible causes

    I am totally at a loss as to why smd started destroying everything like it did. But a few things come to mind:
    1. I rewired my office on that day.
    2. This meant unplugging curie, the workstation.
    3. It has a bad CMOS battery (known problem), so it jumped around the time continuum a few times, sometimes by years.
    4. The smd services are run from a systemd unit with OnCalendar=*:0/2. I have heard that it's possible that major time jumps "pile up" execution of jobs, and it seems this happened in this case.
    5. It's possible that locking in smd is not as great as it could be, and that it corrupted its internal data structures on curie, which led it to command a destruction of the remote mail spool (a sketch of one way to serialize those runs follows below).
    It's also possible that there was a disk failure on the server, marcos. But since it's running on a (software) RAID-1 array, and no errors have been found (according to dmesg), I don't think that's a plausible hypothesis.
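    For what it's worth, causes 4 and 5 could be made less dangerous by serializing smd-pull and smd-push behind a single lock, so that piled-up timer runs can never overlap. A minimal sketch of a systemd user-unit override, assuming the smd binaries live in /usr/bin (the unit names come from the logs above; the lock path is illustrative, and this is not part of smd itself):

    # ~/.config/systemd/user/smd-pull.service.d/override.conf
    # (repeat the same ExecStart= pattern, with smd-push, for smd-push.service)
    [Service]
    # clear the packaged ExecStart, then wrap the command in flock(1) so that
    # pull and push contend on the same lock and never run concurrently
    ExecStart=
    ExecStart=/usr/bin/flock /run/user/%U/smd.lock /usr/bin/smd-pull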

    Lessons learned
    1. follow what smd says, even if it seems useless or strange.
    2. trust but verify: just backup everything before you do anything, especially the largest data set.
    3. daily backups are not great for email, unless you're ready to lose a day of email (which I'm not; a sketch of an hourly timer follows after this list).
    4. hashdeep is great. I keep finding new use cases for it. Last time it was to audit my camera SD card to make sure I didn't forget anything, and now this. It's fast and powerful.
    5. borg is great too. The FUSE mount was especially useful, and it was pretty fast to explore the backup, even through that overhead: checksumming 15GB of mail took about 35 minutes, which gives a respectable 8MB/s, probably bottlenecked by the crap external USB drive I use for backups (!).
    6. I really need to finish my backup system so that I have automated offsite backups, although in this case that would actually have been much slower (certainly not 8MB/s!).
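
    As for lesson 3, one cheap mitigation would be to run the existing borg backup hourly instead of daily; a minimal sketch of a systemd timer, assuming a borg-mail.service that wraps the borg create invocation (both unit names are illustrative):

    # /etc/systemd/system/borg-mail.timer (sketch)
    [Timer]
    # run the backup every hour, and catch up on missed runs after downtime
    OnCalendar=hourly
    Persistent=true

    [Install]
    WantedBy=timers.target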

    Workarounds and solutions

    I set up fake-hwclock on curie, so that the next power failure will not upset my clock that badly. I am thinking of switching to ZFS or BTRFS for most of my filesystems, so that I can use filesystem snapshots (including remotely!) as a backup strategy. This seems so much more powerful than crawling the filesystem for changes, and allows for truly offsite backups protected from an attacker (hopefully). But it's a long way there.

    I'm also thinking of rebuilding my mail setup without smd. It's not the first time something like this happens with smd. It's the first time I am more confident it's the root cause of the problem, however, and it makes me really nervous for the future. I have used offlineimap in the past and it seems it was finally ported to Python 3 so that could be an option again. isync/mbsync is another option, which I tried before but do not remember why I didn't switch. A complete redesign with something like getmail and/or nncp could also be an option. But alas, I lack the time to go crazy with those experiments.

    Somehow, doing like everyone else and just going with Google still doesn't seem to be an option for me. Screw big tech. But I am afraid they will win, eventually. In any case, I'm just happy I got mail again, strangely.

    9 February 2021

    Kees Cook: security things in Linux v5.8

    Previously: v5.7

    Linux v5.8 was released in August, 2020. Here's my summary of various security things that caught my attention:

    arm64 Branch Target Identification

    Dave Martin added support for ARMv8.5's Branch Target Instructions (BTI), which are enabled in userspace at execve() time, and all the time in the kernel (which required manually marking up a lot of non-C code, like assembly and JIT code). With this in place, Jump-Oriented Programming (JOP, where code gadgets are chained together with jumps and calls) is no longer available to the attacker. An attacker's code must make direct function calls. This basically reduces the usable code available to an attacker from every word in the kernel text to only function entries (or jump targets). This is a low granularity forward-edge Control Flow Integrity (CFI) feature, which is important (since it greatly reduces the potential targets that can be used in an attack) and cheap (implemented in hardware). It's a good first step to strong CFI, but (as we've seen with things like CFG) it isn't usually strong enough to stop a motivated attacker. High granularity CFI (which uses a more specific branch-target characteristic, like function prototypes, to track expected call sites) is not yet a hardware supported feature, but the software version will be coming in the future by way of Clang's CFI implementation.

    arm64 Shadow Call Stack

    Sami Tolvanen landed the kernel implementation of Clang's Shadow Call Stack (SCS), which protects the kernel against Return-Oriented Programming (ROP) attacks (where code gadgets are chained together with returns). This backward-edge CFI protection is implemented by keeping a second dedicated stack pointer register (x18) and keeping a copy of the return addresses stored in a separate "shadow stack". In this way, manipulating the regular stack's return addresses will have no effect. (And since a copy of the return address continues to live in the regular stack, no changes are needed for back trace dumps, etc.) It's worth noting that unlike BTI (which is hardware based), this is a software defense that relies on the location of the Shadow Stack (i.e. the value of x18) staying secret, since the memory could be written to directly. Intel's hardware ROP defense (CET) uses a hardware shadow stack that isn't directly writable. ARM's hardware defense against ROP is PAC (which is actually designed as an arbitrary CFI defense: it can be used for forward-edge too), but that depends on having ARMv8.3 hardware. The expectation is that SCS will be used until PAC is available.

    Kernel Concurrency Sanitizer infrastructure added

    Marco Elver landed support for the Kernel Concurrency Sanitizer, which is a new debugging infrastructure to find data races in the kernel, via CONFIG_KCSAN. This immediately found real bugs, with some fixes having already landed too. For more details, see the KCSAN documentation.

    new capabilities

    Alexey Budankov added CAP_PERFMON, which is designed to allow access to perf(). The idea is that this capability gives a process access to only read aspects of the running kernel and system. No longer will access be needed through the much more powerful abilities of CAP_SYS_ADMIN, which has many ways to change kernel internals. This allows for a split between controls over the confidentiality (read access via CAP_PERFMON) of the kernel vs control over integrity (write access via CAP_SYS_ADMIN). Alexei Starovoitov added CAP_BPF, which is designed to separate BPF access from the all-powerful CAP_SYS_ADMIN. It is designed to be used in combination with CAP_PERFMON for tracing-like activities and CAP_NET_ADMIN for networking-related activities. For things that could change kernel integrity (i.e. write access), CAP_SYS_ADMIN is still required.

    network random number generator improvements

    Willy Tarreau made the network code's random number generator less predictable. This will further frustrate any attacker's attempts to recover the state of the RNG externally, which might lead to the ability to hijack network sessions (by correctly guessing packet states).

    fix various kernel address exposures to non-CAP_SYSLOG

    I fixed several situations where kernel addresses were still being exposed to unprivileged (i.e. non-CAP_SYSLOG) users, though usually only through odd corner cases. After refactoring how capabilities were being checked for files in /sys and /proc, the kernel modules sections, kprobes, and BPF exposures got fixed. (Though in doing so, I briefly made things much worse before getting it properly fixed. Yikes!)

    RISCV W^X detection

    Following up on his recent work to enable strict kernel memory protections on RISCV, Zong Li has now added support for CONFIG_DEBUG_WX as seen for other architectures. Any writable and executable memory regions in the kernel (which are lovely targets for attackers) will be loudly noted at boot so they can get corrected.

    execve() refactoring continues

    Eric W. Biederman continued working on execve() refactoring, including getting rid of the frequently problematic recursion used to locate binary handlers. I used the opportunity to dust off some old binfmt_script regression tests and get them into the kernel selftests.

    multiple /proc instances

    Alexey Gladkov modernized /proc internals and provided a way to have multiple /proc instances mounted in the same PID namespace. This allows for having multiple views of /proc, with different features enabled. (Including the newly added hidepid=4 and subset=pid mount options.)

    set_fs() removal continues

    Christoph Hellwig, with Eric W. Biederman, Arnd Bergmann, and others, have been diligently working to entirely remove the kernel's set_fs() interface, which has long been a source of security flaws due to weird confusions about which address space the kernel thought it should be accessing. Beyond things like the lower-level per-architecture signal handling code, this has needed to touch various parts of the ELF loader, and networking code too.

    READ_IMPLIES_EXEC is no more for native 64-bit

    The READ_IMPLIES_EXEC flag was a work-around for dealing with the addition of non-executable (NX) memory when x86_64 was introduced. It was designed as a way to mark a memory region as "well, since we don't know if this memory region was expected to be executable, we must assume that if we need to read it, we need to be allowed to execute it too". It was designed mostly for stack memory (where trampoline code might live), but it would carry over into all mmap() allocations, which would mean sometimes exposing a large attack surface to an attacker looking to find executable memory. While normally this didn't cause problems on modern systems that correctly marked their ELF sections as NX, there were still some awkward corner-cases. I fixed this by splitting READ_IMPLIES_EXEC from the ELF PT_GNU_STACK marking on x86 and arm/arm64, and declaring that a native 64-bit process would never gain READ_IMPLIES_EXEC on x86_64 and arm64, which matches the behavior of other native 64-bit architectures that correctly didn't ever implement READ_IMPLIES_EXEC in the first place.

    array index bounds checking continues

    As part of the ongoing work to use modern flexible arrays in the kernel, Gustavo A. R. Silva added the flex_array_size() helper (as a cousin to struct_size()). The zero/one-member into flex array conversions continue with over a hundred commits as we slowly get closer to being able to build with -Warray-bounds.

    scnprintf() replacement continues

    Chen Zhou joined Takashi Iwai in continuing to replace potentially unsafe uses of sprintf() with scnprintf(). Fixing all of these will make sure the kernel avoids nasty buffer concatenation surprises.

    That's it for now! Let me know if there is anything else you think I should mention here. Next up: Linux v5.9.

    © 2021, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 License.
    CC BY-SA 4.0

    29 January 2021

    Russ Allbery: INN 2.6.4

    INN 2.6.4 has been released. This is a bug fix and minor feature release over INN 2.6.3, and the upgrade should be painless. You can download the new release from ftp.isc.org or my personal INN pages. The latter also has links to the full changelog and the other INN documentation. As always, thanks to Julien ÉLIE for preparing this release and doing most of the maintenance work on INN! Changes in this release:

    16 November 2020

    Bits from Debian: New Debian Developers and Maintainers (September and October 2020)

    The following contributors got their Debian Developer accounts in the last two months: The following contributors were added as Debian Maintainers in the last two months: Congratulations!

    26 October 2020

    Marco d'Itri: RPKI validation with FORT Validator

    This article documents how to install FORT Validator (an RPKI relying party software which also implements the RPKI to Router protocol in a single daemon) on Debian 10 to provide RPKI validation to routers. If you are using testing or unstable then you can just skip the part about apt pinnings. The packages in bullseye (Debian testing) can be installed as is on Debian stable with no need to rebuild them, by configuring an appropriate pinning for apt:
    cat <<END > /etc/apt/sources.list.d/bullseye.list
    deb http://deb.debian.org/debian/ bullseye main
    END
    cat <<END > /etc/apt/preferences.d/pin-rpki
    # by default do not install anything from bullseye
    Package: *
    Pin: release bullseye
    Pin-Priority: 100
    Package: fort-validator rpki-trust-anchors
    Pin: release bullseye
    Pin-Priority: 990
    END
    apt update
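
    Optionally, check that the pinning took effect before installing (a quick sanity check, not part of the original article):

    apt-cache policy fort-validator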
    
    Before starting, make sure that curl (or wget) and the web PKI certificates are installed:
    apt install curl ca-certificates
    
    If you already know about the legal issues related to the ARIN TAL then you may instruct the package to install it automatically. If you skip this step then you will be asked about it at installation time; either way is fine.
    echo 'rpki-trust-anchors rpki-trust-anchors/get_arin_tal boolean true' \
        | debconf-set-selections
    
    Install the package as usual:
    apt install fort-validator
    
    You may also install rpki-client and gortr on Debian 10, or maybe cfrpki and gortr. I have also tried packaging Routinator 3000 for Debian, but this effort is currently on hold because the Rust ecosystem is broken and hostile to the good packaging practices of Linux distributions.

    Marco d'Itri: RPKI validation with OpenBSD's rpki-client and Cloudflare's gortr

    This article documents how to install rpki-client (an RPKI relying party software, the actual validator) and gortr (which implements the RPKI to Router protocol) on Debian 10 to provide RPKI validation to routers. If you are using testing or unstable then you can just skip the part about apt pinnings. The packages in bullseye (Debian testing) can be installed as is on Debian stable with no need to rebuild them, by configuring an appropriate pinning for apt:
    cat <<END > /etc/apt/sources.list.d/bullseye.list
    deb http://deb.debian.org/debian/ bullseye main
    END
    cat <<END > /etc/apt/preferences.d/pin-rpki
    # by default do not install anything from bullseye
    Package: *
    Pin: release bullseye
    Pin-Priority: 100
    Package: gortr rpki-client rpki-trust-anchors
    Pin: release bullseye
    Pin-Priority: 990
    END
    apt update
    
    Before starting, make sure that curl (or wget) and the web PKI certificates are installed:
    apt install curl ca-certificates
    
    If you already know about the legal issues related to the ARIN TAL then you may instruct the package to install it automatically. If you skip this step then you will be asked about it at installation time; either way is fine.
    echo 'rpki-trust-anchors rpki-trust-anchors/get_arin_tal boolean true' \
        | debconf-set-selections
    
    Install the packages as usual:
    apt install rpki-client gortr
    
    And then configure rpki-client to generate its output in the JSON format needed by gortr:
    echo 'OPTIONS=-j' > /etc/default/rpki-client
    
    You may manually start the service unit to immediately generate the data instead of waiting for the next timer run:
    systemctl start rpki-client &
    
    gortr too needs to be configured to use the JSON data generated by rpki-client:
    echo 'GORTR_ARGS=-bind :323 -verify=false -checktime=false -cache /var/lib/rpki-client/json' > /etc/default/gortr
    
    And then it needs to be restarted to use the new configuration:
    systemctl restart gortr
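
    To confirm that gortr is now listening for RTR sessions (port 323 comes from the GORTR_ARGS line above; ss is from iproute2, and this check is not part of the original article):

    ss -tln | grep :323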
    
    You may also install FORT Validator on Debian 10, or maybe cfrpki with gortr. I have also tried packaging Routinator 3000 for Debian, but this effort is currently on hold because the Rust ecosystem is broken and hostile to the packaging practices of Linux distributions.

    21 July 2020

    Bits from Debian: New Debian Developers and Maintainers (May and June 2020)

    The following contributors got their Debian Developer accounts in the last two months: The following contributors were added as Debian Maintainers in the last two months: Congratulations!

    15 April 2020

    Antoine Beaupré: OpenDKIM configuration to send debian.org email

    Debian.org added support for DKIM in 2020. To configure this on my side, I had to do the following, on top of my email configuration.
    1. add this line to /etc/opendkim/signing.table:
      *@debian.org marcos-debian.anarcat.user
      
    2. add this line to /etc/opendkim/key.table:
      marcos-debian.anarcat.user debian.org:marcos-debian.anarcat.user:/etc/opendkim/keys/marcos-debian.anarcat.user.private
      
Yes, that's quite a mouthful! The magic selector is that long because it needs a special syntax (specifically the .anarcat.user suffix) for Debian to be happy. The -debian string tells me where the key is published. The marcos prefix reminds me where the private key is used.
    3. generate the key with:
      opendkim-genkey --directory=/etc/opendkim/keys/ --selector=marcos-debian.anarcat.user --domain=debian.org --verbose
      
      This creates the DNS record in /etc/opendkim/keys/marcos-debian.anarcat.user.txt (alongside the private key in .key).
    4. restart OpenDKIM:
      service opendkim restart
      
      The DNS record will look something like this:
      marcos-debian.anarcat.user._domainkey   IN  TXT ( "v=DKIM1; h=sha256; k=rsa; "
      "p=MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAtKzBK2f8vg5yV307WAOatOhypQt3ANQ95iDaewkVehmx42lZ6b4PzA1k5DkIarxjkk+7m6oSpx5H3egrUSLMirUiMGsIb5XVGBPFmKZhDVmC7F5G1SV7SRqqKZYrXTufRRSne1eEtA31xpMP0B32f6v6lkoIZwS07yQ7DDbwA9MHfyb6MkgAvDwNJ45H4cOcdlCt0AnTSVndcl"
      "pci5/2o/oKD05J9hxFTtlEblrhDXWRQR7pmthN8qg4WaNI4WszbB3Or4eBCxhUdvAt2NF9c9eYLQGf0jfRsbOcjSfeus0e2fpsKW7JMvFzX8+O5pWfSpRpdPatOt80yy0eqpm1uQIDAQAB" )  ; ----- DKIM key marcos-debian.anarcat.user for debian.org
      
    5. The "p=MIIB..." string needs to be joined together, without the quotes and the p=, and sent in a signed email to changes@db.debian.org:
      -----BEGIN PGP SIGNED MESSAGE-----
      dkimPubKey: marcos.anarcat.user MIIB[...]
      -----BEGIN PGP SIGNATURE-----
      [...]
      
6. Wait a few minutes for the DNS records to propagate. You can check whether they have with the command below, or with the opendkim-testkey sketch after this list:
      host -t TXT marcos-debian.anarcat.user._domainkey.debian.org nsp.dnsnode.net
      
      (nsp.dnsnode.net being one of the NS records of the debian.org zone.)
    If all goes well, the tests should pass when sending from your server as anarcat@debian.org.
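Once the record is published, the key can also be checked from the server side with opendkim-testkey (from the opendkim-tools package); a minimal sketch:
opendkim-testkey -d debian.org -s marcos-debian.anarcat.user -vvv
With -vvv it reports each step of the lookup; a "key not secure" message only means the answer was not DNSSEC-validated and is usually harmless for this test.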

Testing

Test messages can be sent to dkimvalidator, mail-tester.com or check-auth@verifier.port25.com. Those tools will run Spamassassin on the received emails and report the results. What you are looking for is:
    • -0.1 DKIM_VALID: Message has at least one valid DKIM or DK signature
    • -0.1 DKIM_VALID_AU: Message has a valid DKIM or DK signature from author's domain
    • -0.1 DKIM_VALID_EF: Message has a valid DKIM or DK signature from envelope-from domain
If one of those is missing, then you are doing something wrong and your "spamminess" score will be worse. The latter is especially tricky as it validates the "Envelope From", which is the address given in the SMTP MAIL FROM: command by the originating MTA, and which you see as from=<> in the Postfix logs. The following will happen anyway, as soon as you have a signature; that's normal:
    • 0.1 DKIM_SIGNED: Message has a DKIM or DK signature, not necessarily valid
And this might happen if you have an ADSP record but do not correctly sign the message with a domain field that matches the record:
    • 1.1 DKIM_ADSP_ALL No valid author signature, domain signs all mail
That's bad and will affect your spam score badly. I fixed that issue by using a wildcard in the key table:
    --- a/opendkim/key.table
    +++ b/opendkim/key.table
    @@ -1 +1 @@
    -marcos anarc.at:marcos:/etc/opendkim/keys/marcos.private
    +marcos %:marcos:/etc/opendkim/keys/marcos.private
    

References

This is a copy of a subset of my more complete email configuration.

    16 July 2017

    Jose M. Calhariz: Crossgrading a complex Desktop and Debian Developer machine running Debian 9

This article is a work in progress; please check back, as I am updating it with new information. I have a very old installation of Debian, possibly dating back to version 2 (I do not remember exactly), that I have upgraded since then in both software and hardware. Now the hardware is 64-bit and runs a 64-bit kernel, but the userland is still 32-bit. For 99% of tasks this is very good. After many simulations I may have found a procedure to crossgrade my desktop. I write down the tentative procedure here and will update it with more ideas on the problems that I find. First you need to install a 64-bit kernel and boot with it. See my previous post on how to do it. Second you need to bootstrap the crossgrade and install all the libraries as amd64:
     apt-get update
     apt-get upgrade
     apt-get clean
     dpkg --list > original.dpkg
     apt-get --download-only install dpkg:amd64 tar:amd64 apt:amd64 bash:amd64 dash:amd64 init:amd64 mawk:amd64
     cd /var/cache/apt/archives/
     dpkg --install dpkg_*amd64.deb tar_*amd64.deb apt_*amd64.deb bash_*amd64.deb dash_*amd64.deb *.deb
     dpkg --configure --pending
     dpkg -i --skip-same-version dpkg_*_amd64.deb apt_*_amd64.deb bash_*_amd64.deb dash_*_amd64.deb mawk_*_amd64.deb *.deb
     
 # original.dpkg was created in $HOME before the cd above
 for pack32 in $(grep i386 ~/original.dpkg | egrep "^ii " | awk '{ print $2 }') ; do
   echo $pack32 ;
   if dpkg --status $pack32 | grep -q "Multi-Arch: same" ; then
     apt-get --download-only install -y --allow-remove-essential ${pack32%:i386}:amd64 ;
   fi ;
 done
     dpkg --install /var/cache/apt/archives/*_amd64.deb
     dpkg --install /var/cache/apt/archives/*_amd64.deb
     dpkg --print-architecture
     dpkg --print-foreign-architectures
    
But this procedure does not prevent "apt-get install" from ending up with broken dependencies. So next I try to install the core packages and the libraries using "dpkg -i".
    apt-get update
    apt-get upgrade
    apt-get autoremove
    apt-get clean
    dpkg --list > original.dpkg
    apt-get --download-only install dpkg:amd64 tar:amd64 apt:amd64 bash:amd64 dash:amd64 init:amd64 mawk:amd64
for pack32 in $(grep i386 original.dpkg | egrep "^ii " | awk '{ print $2 }') ; do
  echo $pack32 ;
  if dpkg --status $pack32 | grep -q "Multi-Arch: same" ; then
    apt-get --download-only install -y --allow-remove-essential ${pack32%:i386}:amd64 ;
  fi ;
done
    cd /var/cache/apt/archives/
    dpkg --install dpkg_*amd64.deb tar_*amd64.deb apt_*amd64.deb bash_*amd64.deb dash_*amd64.deb *.deb
    dpkg --configure --pending
    dpkg --install --skip-same-version dpkg_*_amd64.deb apt_*_amd64.deb bash_*_amd64.deb dash_*_amd64.deb mawk_*_amd64.deb *.deb
    dpkg --remove libcurl4-openssl-dev
    dpkg -i libcurl4-openssl-dev_*_amd64.deb
    
Remove packages until there are no broken packages left:
    dpkg --print-architecture
    dpkg --print-foreign-architectures
    apt-get --fix-broken --allow-remove-essential install
    
Still broken, because apt-get removed dpkg. So instead of only installing the libraries with dpkg -i, I am going to try to install all the packages with dpkg -i:
    apt-get update
    apt-get upgrade
    apt-get autoremove
    apt-get clean
    dpkg --list > original.dpkg
    apt-get --download-only install dpkg:amd64 tar:amd64 apt:amd64 bash:amd64 dash:amd64 init:amd64 mawk:amd64
for pack32 in $(grep i386 original.dpkg | egrep "^ii " | awk '{ print $2 }') ; do
  echo $pack32 ;
  apt-get --download-only install -y --allow-remove-essential ${pack32%:i386}:amd64 ;
done
    cd /var/cache/apt/archives/
    dpkg --install dpkg_*amd64.deb tar_*amd64.deb apt_*amd64.deb bash_*amd64.deb dash_*amd64.deb *.deb
    dpkg --configure --pending
    dpkg --install --skip-same-version dpkg_*_amd64.deb apt_*_amd64.deb bash_*_amd64.deb dash_*_amd64.deb mawk_*_amd64.deb *.deb
    dpkg --configure --pending
    
Remove packages and reinstall selected packages until you fix all of them. What follows is the trail for my machine:
    dpkg --remove rkhunter
    dpkg --remove libmarco-private1:i386 marco mate-control-center mate-desktop-environment-core mate-desktop-environment-core  mate-desktop-environment mate-desktop-environment-core mate-desktop-environment-extras
    dpkg --remove libmate-menu2:i386 libmate-window-settings1:i386 mate-panel mate-screensaver python-mate-menu libmate-slab0:i386 mozo mate-menus
    dpkg --remove libmate-menu2:i386 mate-panel python-mate-menu mate-applets mate-menus
    dpkg -i libmate-menu2_1.16.0-2_amd64.deb
    dpkg --remove  gir1.2-ibus-1.0:i386 gnome-shell gnome-shell-extensions gdm3 gnome-session
    dpkg --remove  gir1.2-ibus-1.0:i386
    dpkg --remove libmateweather1:i386
    dpkg -i libmateweather1_1.16.1-2_amd64.deb
    apt-get --fix-broken --download-only install
    dpkg --skip-same-version --install dpkg_*amd64.deb tar_*amd64.deb apt_*amd64.deb bash_*amd64.deb dash_*amd64.deb *.deb
    dpkg --configure --pending
    dpkg -i python_2.7.13-2_amd64.deb
    dpkg --configure --pending
    dpkg -i perl_5.24.1-3+deb9u1_amd64.deb perl-base_5.24.1-3+deb9u1_amd64.deb
    dpkg -i exim4-daemon-light_4.89-2+deb9u1_amd64.deb exim4-base_4.89-2+deb9u1_amd64.deb
    dpkg -i libuuid-perl_0.27-1_amd64.deb
    dpkg --configure --pending
    dpkg --install gstreamer1.0-plugins-bad_1.10.4-1_amd64.deb libmpeg2encpp-2.1-0_1%3a2.1.0+debian-5_amd64.deb libmplex2-2.1-0_1%3a2.1.0+debian-5_amd64.deb
    dpkg --configure --pending
    dpkg --audit
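A convenient way to track progress during this loop is to count how many i386 packages are still installed; a sketch using dpkg-query:
# count remaining i386 packages; drop the "wc -l" to list them instead
dpkg-query -W -f '${Package}:${Architecture}\n' | grep ':i386' | wc -l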
    
    Now fixing broken dependencies on apt-get. Found no other way than removing all the broken packages.
dpkg --remove $(apt-get --fix-broken install | cut -f 2 -d ' ')
apt-get install $(grep -v ":i386" ~/original.dpkg | egrep "^ii" | grep -v "aiccu" | grep -v "acroread" | grep -v "flash-player-properties" | grep -v "flashplayer-mozilla" | egrep -v "tp-flash-marillat" | awk '{ print $2 }')
    

    7 December 2016

Jonas Meurer: On CVE-2016-4484, a (security)? bug in the cryptsetup initramfs integration

On November 4, I was made aware of a security vulnerability in the integration of cryptsetup into initramfs. The vulnerability was discovered by security researchers Hector Marco and Ismael Ripoll of the CyberSecurity UPV Research Group and got CVE-2016-4484 assigned. In this post I'll try to reflect a bit on the issue.

What CVE-2016-4484 is all about

Basically, the vulnerability is about two separate but related issues:

1. Initramfs rescue shell considered harmful

The main topic that Hector Marco and Ismael Ripoll address in their publication is that Debian exits into a rescue shell in case of failure during initramfs, and that this can be triggered by entering a wrong password ~93 times in a row. Indeed the Debian initramfs implementation as provided by initramfs-tools exits into a rescue shell (usually a busybox shell) after a defined amount of failed attempts to make the root filesystem available. The loop in question is in local_device_setup() in the local initramfs script. In general, this behaviour is considered a feature: if the root device hasn't shown up after 30 rounds, the rescue shell is spawned to provide the local user/admin a way to debug and fix things herself. Hector Marco and Ismael Ripoll argue that in special environments, e.g. on public computers with password protected BIOS/UEFI and bootloader, this opens an attack vector and needs to be regarded as a security vulnerability:
    It is common to assume that once the attacker has physical access to the computer, the game is over. The attackers can do whatever they want. And although this was true 30 years ago, today it is not. There are many "levels" of physical access. [...] In order to protect the computer in these scenarios: the BIOS/UEFI has one or two passwords to protect the booting or the configuration menu; the GRUB also has the possibility to use multiple passwords to protect unauthorized operations. And in the case of an encrypted system, the initrd shall block the maximum number of password trials and prevent the access to the computer in that case.
    While Hector and Ismael have a valid point in that the rescue shell might open an additional attack vector in special setups, this is not true for the vast majority of Debian systems out there: in most cases a local attacker can alter the boot order, replace or add boot devices, modify boot options in the (GNU GRUB) bootloader menu or modify/replace arbitrary hardware parts. The required scenario to make the initramfs rescue shell an additional attack vector is indeed very special: locked down hardware, password protected BIOS and bootloader but still local keyboard (or serial console) access are required at least. Hector and Ismael argue that the default should be changed for enhanced security:
    [...] But then Linux is used in more hostile environments, this helpful (but naive) recovery services shall not be the default option.
For the reasons explained above, I tend to disagree with Hector's and Ismael's opinion here. And after discussing this topic with several people I find my opinion reconfirmed: the Debian Security Team disputes the security impact of the issue and others agree. But leaving the disputable opinion on a sane default aside, I don't think that the cryptsetup package is the right place to change the default, if at all. If you want added security from a locked down initramfs (i.e. no rescue shell spawned), then at least the bootloader (GNU GRUB) needs to be locked down by default as well. To make it clear: if one wants to lock down the boot process, bootloader and initramfs should be locked down together. And the right place to do this would be the configurable behaviour of grub-mkconfig. Here, one can set a password for GRUB and the boot parameter 'panic=1', which disables the spawning of a rescue shell in initramfs (a minimal sketch follows after the list below). But as mentioned, I don't agree that these would be sane defaults. The vast majority of Debian systems out there gain no security from a locked down bootloader and initramfs, and the benefit of a rescue shell for debugging purposes clearly outweighs the minor security impact in my opinion. For the few setups which require the added security of a locked down bootloader and initramfs, we already have the relevant options documented in the Securing Debian Manual. After discussing the topic with initramfs-tools maintainers today, Guilhem and I (the cryptsetup maintainers) finally decided to not change any defaults and just add a 'sleep 60' after the maximum allowed attempts were reached.

2. tries=n option ignored, local brute-force slightly cheaper

Apart from the issue of a rescue shell being spawned, Hector and Ismael also discovered a programming bug in the cryptsetup initramfs integration. This bug in the cryptroot initramfs local-top script allowed endless retries of passphrase input, ignoring the tries=n option of crypttab (and the default of 3). As a result, theoretically unlimited attempts to unlock encrypted disks were possible when processed during the initramfs stage. The attack vector here was that local brute-force attacks become a bit cheaper: instead of having to reboot after the maximum number of tries was reached, one could go on trying passwords. Even though efficient brute-force attacks are mitigated by the PBKDF2 implementation in cryptsetup, this clearly is a real bug. The reason for the bug was twofold:
• First, the condition in setup_mapping() responsible for making the function fail when the maximum number of allowed attempts is reached was never met:
setup_mapping()
{
  [...]
  # Try to get a satisfactory password $crypttries times
  count=0
  while [ $crypttries -le 0 ] || [ $count -lt $crypttries ]; do
    export CRYPTTAB_TRIED="$count"
    count=$(( $count + 1 ))
    [...]
  done
  if [ $crypttries -gt 0 ] && [ $count -gt $crypttries ]; then
    message "cryptsetup: maximum number of tries exceeded for $crypttarget"
    return 1
  fi
  [...]
}
As one can see, the while loop exits as soon as $count -lt $crypttries no longer holds, i.e. when $count has reached exactly $crypttries. Thus the second condition $count -gt $crypttries is never met. This can easily be fixed by decreasing $count by one in case of a successful unlock attempt, along with changing the second condition to $count -ge $crypttries:
setup_mapping()
{
  [...]
  while [ $crypttries -le 0 ] || [ $count -lt $crypttries ]; do
    [...]
    # decrease $count by 1, apparently last try was successful.
    count=$(( $count - 1 ))
    [...]
  done
  if [ $crypttries -gt 0 ] && [ $count -ge $crypttries ]; then
    [...]
  fi
  [...]
}
      
Christian Lamparter already spotted this bug back in October 2011 and provided an (incomplete) patch, but back then I even managed to merge the patch in an improper way, making it even more useless: the patch by Christian forgot to decrease $count by one in case of a successful unlock attempt, resulting in warnings about maximum tries exceeded even for successful attempts in some circumstances. But instead of adding the decrease myself and keeping the (almost correct) condition $count -eq $crypttries for detection of exceeded maximum tries, I changed back the condition to the wrong original $count -gt $crypttries that again was never met. Apparently I didn't test the fix properly back then. I definitely should do better in future!
    • Second, back in December 2013, I added a cryptroot initramfs local-block script as suggested by Goswin von Brederlow in order to fix bug #678692. The purpose of the cryptroot initramfs local-block script is to invoke the cryptroot initramfs local-top script again and again in a loop. This is required to support complex block device stacks. In fact, the numberless options of stacked block devices are one of the biggest and most inglorious reasons that the cryptsetup initramfs integration scripts became so complex over the years. After all we need to support setups like rootfs on top of LVM with two separate encrypted PVs or rootfs on top of LVM on top of dm-crypt on top of MD raid. The problem with the local-block script is that exiting the setup_mapping() function merely triggers a new invocation of the very same function. The guys who discovered the bug suggested a simple and good solution to this bug: When maximum attempts are detected (by second condition from above), the script sleeps for 60 seconds. This mitigates the brute-force attack options for local attackers - even rebooting after max attempts should be faster.
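To illustrate the lock-down discussed above, here is a minimal sketch of locking down GRUB and the initramfs together; the admin user name and the password hash (generated with grub-mkpasswd-pbkdf2) are placeholders:
# /etc/default/grub: panic=1 prevents the initramfs rescue shell
GRUB_CMDLINE_LINUX_DEFAULT="quiet panic=1"
# /etc/grub.d/40_custom: protect the bootloader itself
set superusers="admin"
password_pbkdf2 admin grub.pbkdf2.sha512.10000.[...]
# note: existing menu entries may need the --unrestricted option to keep booting without a password
Run update-grub afterwards to regenerate the configuration.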

About disclosure, wording and clickbaiting

I'm happy that Hector and Ismael brought up the topic and made their argument about the security impact of an initramfs rescue shell, even though I have to admit that I was rather astonished that they got a CVE assigned. Nevertheless I'm very happy that they informed the Security Teams of Debian and Ubuntu prior to publishing their findings, which put me in the loop in turn. Also, Hector and Ismael were open and responsive when it came to discussing their proposed fixes. But unfortunately the way they advertised their finding was not very helpful. They announced a speech about this topic at DeepSec 2016 in Vienna with the headline Abusing LUKS to Hack the System. Honestly, this headline is misleading - if not wrong - in several ways:
• First, the whole issue is not about LUKS, neither is it about cryptsetup itself. It's about Debian's integration of cryptsetup into the initramfs, which is a completely different story.
    • Second, the term hack the system suggests that an exploit to break into the system is revealed. This is not true. The device encryption is not endangered at all.
• Third - as shown above - very special prerequisites need to be met before the mere existence of a LUKS encrypted device becomes the relevant fact for being able to spawn a rescue shell during initramfs.
Unfortunately, the way this issue was published led to even worse articles in the tech news press. Headlines like Major security hole found in Cryptsetup script for LUKS disk encryption or Linux Flaw allows Root Shell During Boot-Up for LUKS Disk-Encrypted Systems suggest that a major security vulnerability was revealed and that it compromised the protection that cryptsetup and LUKS offer. If these articles did anything at all, then it was causing damage to the cryptsetup project, which is not affected by the whole issue at all. After the cat was out of the bag, Hector and Ismael agreed that the way the news picked up the issue was suboptimal, but I cannot shake the feeling that the over-exaggeration was partly intended and that clickbaiting is taking place here. That's a bit sad.

    20 October 2016

    Daniel Pocock: Choosing smartcards, readers and hardware for the Outreachy project

One of the projects proposed for this round of Outreachy is the PGP / PKI Clean Room live image. Interns, and anybody who decides to start using the project (it is already functional for command line users), need to decide about purchasing various pieces of hardware, including a smart card, a smart card reader and a suitably secure computer to run the clean room image. It may also be desirable to purchase some additional accessories, such as a hardware random number generator. If you have any specific suggestions for hardware or can help arrange any donations of hardware for Outreachy interns, please come and join us in the pki-clean-room mailing list or consider adding ideas on the PGP / PKI clean room wiki.

Choice of smart card

For standard PGP use, the OpenPGP card provides a good choice. For X.509 use cases, such as VPN access, there are a range of choices. I recently obtained one of the SmartCard HSM cards; Card Contact were kind enough to provide me with a free sample. An interesting feature of this card is Elliptic Curve (ECC) support. More potential cards are listed on the OpenSC page here.

Choice of card reader

The technical factors to consider are most easily explained with a table:
• Software - on disk: free/open; reader without PIN-pad: mostly free/open; reader with PIN-pad: proprietary firmware in the reader
• Key extraction - on disk: possible; with either kind of smartcard reader: not generally possible
• Passphrase compromise attack vectors - on disk or reader without PIN-pad: hardware or software keyloggers, phishing, user error (unsophisticated attackers); reader with PIN-pad: exploiting firmware bugs over USB (only sophisticated attackers)
• Other factors - on disk: no hardware; reader without PIN-pad: small, USB key form-factor; reader with PIN-pad: largest form factor
Some are shortlisted on the GnuPG wiki and there has been recent discussion of that list on the GnuPG-users mailing list.

Choice of computer to run the clean room environment

There is a wide array of devices to choose from. Here are some principles that come to mind:
    • Prefer devices without any built-in wireless communications interfaces, or where those interfaces can be removed
    • Even better if there is no wired networking either
    • Particularly concerned users may also want to avoid devices with opaque micro-code/firmware
    • Small devices (laptops) that can be stored away easily in a locked cabinet or safe to prevent tampering
    • No hard disks required
    • Having built-in SD card readers or the ability to add them easily
SD cards and SD card readers

The SD cards are used to store the master private key, used to sign the certificates/keys on the smart cards. Multiple copies are kept. It is a good idea to use SD cards from different vendors, preferably not manufactured in the same batch, to minimize the risk that they all fail at the same time. For convenience, it would be desirable to use a multi-card reader, although the software experience will be much the same if lots of individual card readers or USB flash drives are used.

Other devices

One additional idea that comes to mind is a hardware random number generator (TRNG), such as the FST-01.

Can you help with ideas or donations?

If you have any specific suggestions for hardware or can help arrange any donations of hardware for Outreachy interns, please come and join us in the pki-clean-room mailing list or consider adding ideas on the PGP / PKI clean room wiki.

    4 October 2016

Raphaël Hertzog: My Free Software Activities in September 2016

My monthly report covers a large part of what I have been doing in the free software world. I write it for my donators (thanks to them!) but also for the wider Debian community because it can give ideas to newcomers and it's one of the best ways to find volunteers to work with me on projects that matter to me.

Debian LTS

With the increasing number of paid contributors, easy fixes (CVEs with patches available) tend to be processed rather quickly. All the packages I worked on had issues that were open for a long time because they were hard to handle. I prepared DLA-613-1 fixing 3 CVEs in roundcube. The fix required manually backporting the CSRF handling code, which was not available in the wheezy version. I spent almost 8 hours on roundcube. Then I started to work on tiff3. I reviewed many CVEs: CVE-2016-3658, CVE-2015-7313, CVE-2015-7554, CVE-2015-8668, CVE-2016-5318, CVE-2016-3625, CVE-2016-5319. I updated their status for tiff3 in wheezy, requested reproducer files from the people who reported the CVEs when the files were not publicly available, and made sure that everything was recorded in the upstream bug tracker. The 4.25 hours I spent on the package were not enough to work on patches, so I put the package back in the work queue.

GNOME 3.22 transition

I uploaded a new gnome-shell-timer that works with the GNOME 3.21 that had been uploaded to sid. Unfortunately, that new GNOME (and GTK+) version caused many regressions that affected Debian Testing (and thus Kali) users, in particular in gnome-control-center. I uploaded a new version fixing some of those issues and I reported a bunch of them to upstream too (#771515, #771517, #771696).

Kali

I worked on #836211, creating a dpkg patch to work around the overlayfs limitation (we use it in Kali because persistence of the live system relies on overlayfs), and I contacted the upstream overlayfs maintainer to hopefully get a proper fix on the overlayfs side instead. I uploaded radcli 1.2.6-2.1 to fix RC bug #825121, as the package was removed from testing and openvas depends on it in Kali. As part of the pkg-security team, I sponsored/uploaded acccheck and arp-scan for Marcos Fouces, and p0f 3.09b as well.

Misc Debian work

Distro Tracker. I tested, fixed and merged Paul Wise's patch integrating multiarch hints into tracker.debian.org (#833623).
Debian Handbook. I enabled the new Vietnamese translation on debian-handbook.info and updated all translations with Weblate updates.
systemd units for apache2. I prepared systemd units for apache2 which I submitted in #798430. With the approval of Stefan Fritsch, I committed my work to the git repository and then uploaded the result in version 2.4.23-5.
Hindsight packaging. I first packaged lua-sandbox (#838969), which is a dependency of Hindsight, and then Hindsight itself (#838968). In this process, I opened a couple of upstream tickets.
PIE by default. I uploaded a new version of cpputest compiled with -fPIC so that executables linking to its static library can be compiled with -fPIE (#837363, forwarded upstream here).
Bugs filed. Bad homepage link in haskell-dice-entropy-conduit. Inconsistent options --onlyscripts and --noscripts in debhelper. pidgin entry in security-support-limited is out of date in debian-security-support. New upstream version (2.0.2) in puppet-lint.

Thanks

See you next month for a new summary of my activities.


    1 October 2016

    Kees Cook: security things in Linux v4.6

Previously: v4.5. The v4.6 Linux kernel release included a bunch of stuff, with much more of it under the KSPP umbrella.

seccomp support for parisc

Helge Deller added seccomp support for parisc, which included plumbing support for PTRACE_GETREGSET to get the self-tests working.

x86 32-bit mmap ASLR vs unlimited stack fixed

Hector Marco-Gisbert removed a long-standing limitation to mmap ASLR on 32-bit x86, where setting an unlimited stack (e.g. "ulimit -s unlimited") would turn off mmap ASLR (which provided a way to bypass ASLR when executing setuid processes). Given that ASLR entropy can now be controlled directly (see the v4.5 post), and that the cases where this created an actual problem are very rare, a system that sees collisions between unlimited stack and mmap ASLR can just adjust the 32-bit ASLR entropy instead.

x86 execute-only memory

Dave Hansen added Protection Key support for future x86 CPUs and, as part of this, implemented support for execute-only memory in user-space. On pkeys-supporting CPUs, using mmap(..., PROT_EXEC) (i.e. without PROT_READ) will mean that the memory can be executed but cannot be read (or written). This provides some mitigation against automated ROP gadget finding where an executable is read out of memory to find places that can be used to build a malicious execution path. Using this will require changing some linker behavior (to avoid putting data in executable areas), but seems to otherwise Just Work. I'm looking forward to either emulated QEmu support or access to one of these fancy CPUs.

CONFIG_DEBUG_RODATA enabled by default on arm and arm64, and mandatory on x86

Ard Biesheuvel (arm64) and I (arm) made the poorly-named CONFIG_DEBUG_RODATA enabled by default. This feature controls whether the kernel enforces proper memory protections on its own memory regions (code memory is executable and read-only, read-only data is actually read-only and non-executable, and writable data is non-executable). This protection is a fundamental security primitive for kernel self-protection, so making it on-by-default is required to start any kind of attack surface reduction within the kernel. On x86 CONFIG_DEBUG_RODATA was already enabled by default, but, at Ingo Molnar's suggestion, I made it mandatory: CONFIG_DEBUG_RODATA cannot be turned off on x86. I expect we'll get there with arm and arm64 too, but the protection is still somewhat new on these architectures, so it's reasonable to continue to leave an out for developers that find themselves tripping over it.

arm64 KASLR text base offset

Ard Biesheuvel reworked a ton of arm64 infrastructure to support kernel relocation and, building on that, Kernel Address Space Layout Randomization of the kernel text base offset (and module base offset). As with x86 text base KASLR, this is a probabilistic defense that raises the bar for kernel attacks where finding the KASLR offset must be added to the chain of exploits used for a successful attack. One big difference from x86 is that the entropy for the KASLR must come either from Device Tree (in the /chosen/kaslr-seed property) or from UEFI (via EFI_RNG_PROTOCOL), so if you're building arm64 devices, make sure you have a strong source of early-boot entropy that you can expose through your boot-firmware or boot-loader.

zero-poison after free

Laura Abbott reworked a bunch of the kernel memory management debugging code to add zeroing of freed memory, similar to PaX/Grsecurity's PAX_MEMORY_SANITIZE feature. This feature means that memory is cleared at free, wiping any sensitive data so it doesn't have an opportunity to leak in various ways (e.g. accidentally uninitialized structures or padding), and that certain types of use-after-free flaws cannot be exploited since the memory has been wiped. To take things even a step further, the poisoning can be verified at allocation time to make sure that nothing wrote to it between free and allocation (called "sanity checking"), which can catch another small subset of flaws. To understand the pieces of this, it's worth describing that the kernel's higher level allocator, the page allocator (e.g. __get_free_pages()), is used by the finer-grained slab allocator (e.g. kmem_cache_alloc(), kmalloc()). Poisoning is handled separately in both allocators. The zero-poisoning happens at the page allocator level. Since the slab allocators tend to do their own allocation/freeing, their poisoning happens separately (since on slab free nothing has been freed up to the page allocator). Only limited performance tuning has been done, so the penalty is rather high at the moment, at about 9% when doing a kernel build workload. Future work will include some exclusion of frequently-freed caches (similar to PAX_MEMORY_SANITIZE), and making the options entirely CONFIG controlled (right now both CONFIGs are needed to build in the code, and a kernel command line is needed to activate it). Performing the sanity checking (mentioned above) adds another roughly 3% penalty. In the general case (and once the performance of the poisoning is improved), the security value of the sanity checking isn't worth the performance trade-off. Tests for the features can be found in lkdtm as READ_AFTER_FREE and READ_BUDDY_AFTER_FREE. If you're feeling especially paranoid and have enabled sanity-checking, WRITE_AFTER_FREE and WRITE_BUDDY_AFTER_FREE can test these as well. To perform zero-poisoning of page allocations and (currently non-zero) poisoning of slab allocations, build with:
    CONFIG_DEBUG_PAGEALLOC=n
    CONFIG_PAGE_POISONING=y
    CONFIG_PAGE_POISONING_NO_SANITY=y
    CONFIG_PAGE_POISONING_ZERO=y
    CONFIG_SLUB_DEBUG=y
    and enable the page allocator poisoning and slab allocator poisoning at boot with this on the kernel command line:
    page_poison=on slub_debug=P
To add sanity-checking, change PAGE_POISONING_NO_SANITY=n, and add "F" to slub_debug as slub_debug=PF.

read-only after init

I added the infrastructure to support making certain kernel memory read-only after kernel initialization (inspired by a small part of PaX/Grsecurity's KERNEXEC functionality). The goal is to continue to reduce the attack surface within the kernel by making even more of the memory, especially function pointer tables, read-only (which depends on CONFIG_DEBUG_RODATA above). Function pointer tables (and similar structures) are frequently targeted by attackers when redirecting execution. While many are already declared const in the kernel source code, making them read-only (and therefore unavailable to attackers) for their entire lifetime, there is a class of variables that get initialized during kernel (and module) start-up (i.e. written to during functions that are marked __init) and then never (intentionally) written to again. Some examples are things like the VDSO, vector tables, arch-specific callbacks, etc. As it turns out, most architectures with kernel memory protection already delay making their data read-only until after __init (see mark_rodata_ro()), so it's trivial to declare a new data section (".data..ro_after_init") and add it to the existing read-only data section (".rodata"). Kernel structures can be annotated with the new section (via the __ro_after_init macro), and they'll become read-only once boot has finished. The next step for attack surface reduction infrastructure will be to create a kernel memory region that is passively read-only, but can be made temporarily writable (by a single un-preemptable CPU), for storing sensitive structures that are written to only very rarely. Once this is done, much more of the kernel's attack surface can be made read-only for the majority of its lifetime. As people identify places where __ro_after_init can be used, we can grow the protection. A good place to start is to look through the PaX/Grsecurity patch to find uses of __read_only on variables that are only written to during __init functions. The rest are places that will need the temporarily-writable infrastructure (PaX/Grsecurity uses pax_open_kernel()/pax_close_kernel() for these). That's it for v4.6; next up will be v4.7!

© 2016, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.

    17 August 2016

    Gunnar Wolf: Talking about the Debian keyring in Investigaciones Nucleares, UNAM

    For the readers of my blog that happen to be in Mexico City, I was invited to give a talk at Instituto de Ciencias Nucleares, Ciudad Universitaria, UNAM.

I will be at Auditorio Marcos Moshinsky, on August 26 starting at 13:00. Auditorio Marcos Moshinsky is where we met for the early (~1996-1997) Mexico Linux User Group meetings. And... wow. I'm amazed to realize it's been twenty years since I arrived there, young and innocent, the newest member of what looked like a sect obsessed with world domination and a penguin fetish.
